Weakly Supervised Learning of Heterogeneous Concepts in Videos

Authors

Sohil Shah

Kuldeep Kulkarni

Arijit Biswas

Ankit Gandhi

Om Deshmukh

Larry S. Davis

Abstract

Typical textual descriptions that accompany online videos are ‘weak’: they mention the main concepts in the video but not their spatio-temporal locations. The concepts in a description are typically heterogeneous (e.g., objects, persons, actions), and certain location constraints on these concepts can also be inferred from the description. This paper presents a generalization of the Indian Buffet Process (IBP) that (a) systematically incorporates heterogeneous concepts in an integrated framework and (b) enforces location constraints, enabling efficient classification and localization of the concepts in videos. We develop posterior inference for the proposed formulation using a mean-field variational approximation. Comparative evaluations on the Casablanca and A2D datasets show that the proposed approach significantly outperforms other state-of-the-art techniques, with a 24% relative improvement in pairwise concept classification on the Casablanca dataset and a 9% relative improvement in localization on the A2D dataset over the most competitive baseline.
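The abstract builds on the Indian Buffet Process. As background only, the sketch below draws a binary matrix from the standard one-parameter IBP prior (Griffiths and Ghahramani), not from the paper's heterogeneous, location-constrained generalization, whose details the abstract does not give. Reading rows as video segments and columns as latent concepts is an illustrative assumption, and the helper names `sample_ibp` and `sample_poisson` are hypothetical.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(num_rows: int, alpha: float, seed: int = 0) -> list[list[int]]:
    """Draw a binary matrix Z from the standard IBP(alpha) prior.

    Rows are "customers" (here, hypothetically, video segments) and columns
    are "dishes" (latent concepts). Z[i][k] == 1 means row i uses feature k.
    """
    rng = random.Random(seed)
    counts: list[int] = []  # m_k: number of earlier rows that use feature k
    rows: list[list[int]] = []
    for i in range(1, num_rows + 1):
        row = []
        # Reuse each existing feature k with probability m_k / i.
        for k, m_k in enumerate(counts):
            take = 1 if rng.random() < m_k / i else 0
            row.append(take)
            counts[k] += take
        # Then open Poisson(alpha / i) brand-new features for this row.
        new = sample_poisson(alpha / i, rng)
        row.extend([1] * new)
        counts.extend([1] * new)
        rows.append(row)
    # Pad earlier rows with zeros so Z is rectangular.
    width = len(counts)
    return [r + [0] * (width - len(r)) for r in rows]


if __name__ == "__main__":
    Z = sample_ibp(num_rows=5, alpha=2.0, seed=42)
    for row in Z:
        print(row)
```

Because the number of columns is not fixed in advance, the IBP lets the data decide how many latent features are active; `alpha` controls how many new features each row tends to introduce.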

Similar resources

Finding “It”: Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos

Grounding textual phrases in visual content with standalone image-sentence pairs is a challenging task. When we consider grounding in instructional videos, this problem becomes profoundly more complex: the latent temporal structure of instructional videos breaks independence assumptions and necessitates contextual understanding for resolving ambiguous visual-linguistic cues. Furthermore, dense ...

Semi Supervised Learning in Wild Faces and Videos

We propose an approach for improving unconstrained face recognition based on leveraging weakly labeled web videos. It is easy to obtain videos that are likely to contain a face of interest from sites such as YouTube through issuing queries with a person’s name; however, many examples of faces not belonging to the person of interest will be present. We propose a new technique capable of learning...

Weakly supervised activity analysis with spatio-temporal localisation

In computer vision, an increasing number of weakly annotated videos have become available, because it is often difficult and time-consuming to annotate all the details in the videos collected. Learning methods that analyse human activities in weakly annotated video data have gained great interest in recent years. They are categorised as “weakly supervised learning”, and usually form a m...

Regularized Multi-Concept MIL for weakly-supervised facial behavior categorization

In this work, we address the problem of estimating high-level semantic labels for videos of recorded people by analysing their facial expressions. This problem, which we refer to as facial behavior categorization, is a weakly-supervised learning problem in which frame-by-frame facial gesture annotations are unavailable and only video-level weak labels are provided. Theref...

Weakly supervised learning from images and videos∗

With the amount of on-line available digital content growing daily, large-scale, weakly supervised learning is becoming more and more important. In this talk we present some recent results for weakly supervised learning from images and videos. Standard approaches to object category localization require bounding box annotations of object instances. This time-consuming annotation process is sides...

Publication date: 2016
